Performance bounds for linear stochastic control

Authors

  • Yang Wang
  • Stephen P. Boyd
Abstract

We develop computational bounds on performance for causal state feedback stochastic control with linear dynamics, arbitrary noise distribution, and arbitrary input constraint set. This can be very useful as a comparison with the performance of suboptimal control policies, which we can evaluate using Monte Carlo simulation. Our method involves solving a semidefinite program (a linear optimization problem with linear matrix inequality constraints), a convex optimization problem which can be efficiently solved. Numerical experiments show that the lower bound obtained by our method is often close to the performance achieved by several widely used suboptimal control policies, which shows that both are nearly optimal. As a by-product, our performance bound yields approximate value functions that can be used as control-Lyapunov functions for suboptimal control policies. © 2008 Elsevier B.V. All rights reserved.

1. Linear stochastic control

We consider a discrete-time linear time-invariant system (or plant), with dynamics

    x(t + 1) = A x(t) + B u(t) + w(t),    t = 0, 1, . . . ,    (1)

where x(t) ∈ R^n is the state, u(t) ∈ R^m is the control input, w(t) ∈ R^n is the process noise or exogenous input, A ∈ R^{n×n} is the dynamics matrix, and B ∈ R^{n×m} is the input matrix. We assume that w(t), for different values of t, are zero mean IID. We will also assume that x(0) is random, and independent of all w(t).

We consider causal state feedback control policies, where the current input u(t) is determined from the current and previous states x(0), . . . , x(t), i.e.,

    u(t) = φ_t(x(0), . . . , x(t)),    t = 0, 1, . . . ,

where φ_t : R^{(t+1)n} → R^m. The collection of functions φ_0, φ_1, . . . is called the control policy. For the problem we will consider, it can be shown that there is an optimal policy that is time-invariant and depends only on the current state, i.e., has the form

    u(t) = φ(x(t)),    t = 0, 1, . . . ,    (2)

where φ : R^n → R^m. We will refer to φ as the state feedback function, or the control policy. For a fixed state feedback function φ, Eqs. (1) and (2) determine the state and control input trajectories as functions of x(0) and the process noise trajectory. Thus, for a fixed choice of state feedback function, the state and input trajectories become stochastic processes.

We now introduce the objective function, which we assume has the form

    J = limsup_{T→∞} (1/T) E ∑_{t=0}^{T−1} (ℓ_x(x(t)) + ℓ_u(u(t))),    (3)

where ℓ_x : R^n → R is the state stage cost function, and ℓ_u : R^m → R is the input stage cost function. (We assume that the expectations exist.) The objective J is the average stage cost. Finally, we impose the control input constraint

    u(t) ∈ U (a.s.),    t = 0, 1, . . . ,    (4)

where U ⊆ R^m is a nonempty constraint set with 0 ∈ U. The stage cost functions ℓ_x and ℓ_u, and the input constraint set U, need not be convex.

We can now describe the stochastic control problem. The problem data are A, B, the distribution of w(t), the stage cost functions ℓ_x and ℓ_u, and the input constraint set U; the optimization variable is the state feedback function φ. The stochastic control problem is to find the state feedback function φ that minimizes the objective J, among those that satisfy the input constraint (4). We will let J⋆ denote the optimal value of J, and we let φ⋆ denote an optimal state feedback function. For more on the formulation of the linear stochastic control problem, including technical details (e.g., finiteness of J⋆, existence and uniqueness of an optimal state feedback function), see, e.g., [5,6,26,2,13,27].

The stochastic control problem can be effectively solved in only a few special cases. The most famous example (described in Section 2 in more detail) is when U = R^m (i.e., there are no constraints on the input) and ℓ_x and ℓ_u are convex quadratic functions [14]. In this case the optimal state feedback function is linear, i.e., u(t) = Kx(t), where K ∈ R^{m×n} can be effectively computed from the problem data.

1.1. Suboptimal control policies

Many methods can be used to find a suboptimal state feedback function, i.e., one with (one hopes) a small value of J. We describe three methods in this section; many others can be found in the literature.

1.1.1. Projected linear state feedback

Perhaps the simplest form is a projected linear state feedback,

    φ_plsf(z) = P(K_plsf z),    (5)

where K_plsf ∈ R^{m×n} is a gain matrix (to be chosen), and P is projection onto U. When U is a box, i.e., U = {u | ‖u‖_∞ ≤ U_max}, projection is the same as entry-wise saturation, so the projected linear state feedback policy has the form

    φ_plsf(z) = U_max sat((1/U_max) K_plsf z),

where the sat function is defined for a scalar argument as

    sat(a) = a for |a| ≤ 1,    sat(a) = 1 for a > 1,    sat(a) = −1 for a < −1,

and extended to vectors by acting entry-wise. (Projected linear state feedback is sometimes called saturated linear state feedback in this case.)

1.1.2. Control-Lyapunov feedback

A more sophisticated state feedback function is given by

    φ_clf(z) = argmin_{v ∈ U} (ℓ_u(v) + E V_clf(Az + Bv + w(t))),    (6)

where V_clf : R^n → R (which is to be chosen) is called a control-Lyapunov function [11,22,12,23]. (The optimal control has this form, for a particular choice of V_clf, called the value function or Bellman function for the problem.) When V_clf is quadratic, say V_clf(x) = x^T P x, the zero mean noise contributes only the constant E w(t)^T P w(t) to the expectation, which does not affect the minimizer, so the control-Lyapunov policy (6) can be simplified to

    φ_clf(z) = argmin_{v ∈ U} (ℓ_u(v) + V_clf(Az + Bv)).    (7)
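With a quadratic control-Lyapunov function and a box constraint set, evaluating (7) amounts to solving a small convex quadratic program at each step. The following is a minimal sketch (not code from the paper) using CVXPY, assuming a quadratic input cost ℓ_u(v) = v^T R v; the names A, B, P, R, u_max are illustrative placeholders:

    import cvxpy as cp

    def clf_policy(z, A, B, P, R, u_max):
        """Evaluate the control-Lyapunov policy (7) at state z,
        assuming l_u(v) = v'Rv, V_clf(x) = x'Px, and a box input set."""
        m = B.shape[1]
        v = cp.Variable(m)
        x_next = A @ z + B @ v                      # predicted next state, noise dropped
        obj = cp.Minimize(cp.quad_form(v, R) + cp.quad_form(x_next, P))
        cp.Problem(obj, [cp.norm(v, "inf") <= u_max]).solve()
        return v.value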
1.1.3. Certainty-equivalent model predictive control

An even more sophisticated feedback control function is given by certainty-equivalent model predictive control (MPC) [16,21,10,15,4,19], in which φ(z) is found by solving the optimization problem

    minimize    V_mpc(x̃(T)) + ∑_{τ=0}^{T−1} (ℓ_x(x̃(τ)) + ℓ_u(v(τ)))
    subject to  x̃(τ + 1) = A x̃(τ) + B v(τ),    τ = 0, . . . , T − 1
                v(τ) ∈ U,    τ = 0, . . . , T − 1
                x̃(0) = z,    (8)

with variables v(0), . . . , v(T − 1), x̃(0), . . . , x̃(T). The function V_mpc : R^n → R is the terminal cost (to be chosen), and T is the horizon (also to be chosen). Let v⋆(0), . . . , v⋆(T − 1), x̃⋆(0), . . . , x̃⋆(T) be a solution of this problem. The MPC policy is φ_mpc(z) = v⋆(0), which is a (complicated) function of z through the optimization problem (8). As the horizon T becomes larger, the choice of V_mpc becomes less and less important. When the horizon is T = 1, the MPC policy reduces to the control-Lyapunov policy (7), with V_mpc = V_clf.
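With convex quadratic stage and terminal costs and a box input set, problem (8) is a convex quadratic program. A minimal sketch, again with placeholder names (P_mpc is an assumed quadratic terminal cost matrix):

    import cvxpy as cp

    def mpc_policy(z, A, B, Q, R, P_mpc, T, u_max):
        """Certainty-equivalent MPC: solve (8), return the first input v(0)."""
        n, m = B.shape
        x = cp.Variable((n, T + 1))                 # planned states x~(0), ..., x~(T)
        v = cp.Variable((m, T))                     # planned inputs v(0), ..., v(T-1)
        cost = cp.quad_form(x[:, T], P_mpc)         # terminal cost V_mpc
        cons = [x[:, 0] == z]
        for tau in range(T):
            cost += cp.quad_form(x[:, tau], Q) + cp.quad_form(v[:, tau], R)
            cons += [x[:, tau + 1] == A @ x[:, tau] + B @ v[:, tau],
                     cp.norm(v[:, tau], "inf") <= u_max]
        cp.Problem(cp.Minimize(cost), cons).solve()
        return v[:, 0].value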
1.1.4. Parameters in suboptimal control policies

The art in finding a good suboptimal control policy is in choosing good values for the parameters that appear in them. For projected linear state feedback, the gain matrix K_plsf must be chosen; for a control-Lyapunov policy, V_clf must be chosen; and for MPC, the terminal cost function V_mpc (and horizon T) must be chosen. A common choice for V_clf or V_mpc is the (quadratic) value function for a related linear stochastic control problem with no constraints and quadratic stage cost; K_plsf can be chosen as the associated optimal gain matrix.

These methods (as well as many others) can give suboptimal state feedback functions that achieve good performance, i.e., a low value of J. (The objective J is typically evaluated by stochastic simulation, e.g., Monte Carlo.) A natural question arises: how close to optimal are these suboptimal control policies? In other words, how much larger than the optimal performance J⋆ is J, the performance obtained with a suboptimal control policy? To answer this question, we need to compute a lower bound on J⋆, i.e., a bound on achievable performance over all feasible state feedback control functions.

1.2. Performance bounds

In this paper we show how a numerical lower bound on J⋆ can be effectively computed, using convex optimization, from the problem data. Our bound is not a generic one that depends only on the problem dimensions and general assumptions about ℓ_x, ℓ_u, and U; instead, it is computed for each specific problem instance. We cannot (at this time) guarantee that the bound will be close to J⋆. But in a large number of numerical simulations, we have found that our bound is often not too far from the performance achieved by a suboptimal control policy.

It is very valuable in practice to know that a proposed suboptimal control policy attains a specific cost J (found by Monte Carlo simulation), and that the optimal value of the stochastic control problem must exceed a known lower bound J_lb (found by the method described in this paper). If the gap between the two is small, we can be certain that our suboptimal control policy is nearly optimal (and that our bound is nearly tight) for this problem instance. The gap can be large, of course, for two reasons: our suboptimal controller is substantially suboptimal, or our lower bound is poor (for this problem instance).

2. Linear quadratic control

It is well known that the linear stochastic control problem can be effectively solved when U = R^m (i.e., there are no constraints on the input) and the stage cost functions have the form

    ℓ_x(z) = z^T Q z,    ℓ_u(v) = v^T R v,

where Q ⪰ 0, R ⪰ 0 (meaning, they are symmetric positive semidefinite). In this section we (briefly) review these results, since our bound relies on them. For a more detailed discussion of the linear quadratic stochastic control problem, see, e.g., [5,6,26]. The optimal cost is J⋆ = E w(t)^T P_lq w(t) = Tr(P_lq W), where W = E w(t)w(t)^T is the noise covariance and P_lq is the solution of the discrete-time algebraic Riccati equation associated with the problem data (A, B, Q, R), and the optimal policy is a linear state feedback u(t) = K_lq x(t).
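For these quadratic costs the value function is quadratic, V(x) = x^T P_lq x, and standard numerical routines compute it directly. A minimal sketch using SciPy, assuming R is nonsingular so the standard Riccati solver applies (W is the noise covariance):

    import numpy as np
    from scipy.linalg import solve_discrete_are

    def lq_solution(A, B, Q, R, W):
        """Unconstrained LQ: Riccati solution P, optimal gain K (u = Kx),
        and optimal average stage cost Tr(P W)."""
        P = solve_discrete_are(A, B, Q, R)                    # DARE solution
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)    # optimal gain
        return P, K, np.trace(P @ W)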

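Since enlarging U to R^m is a relaxation, the unconstrained cost Tr(P_lq W) is itself a (generally loose) lower bound on J⋆ for the constrained problem; the SDP-based bound developed in this paper is typically far tighter. For comparison, the cost J of a suboptimal policy such as saturated linear state feedback can be estimated by Monte Carlo simulation, as mentioned in Section 1.1.4. A minimal sketch, assuming Gaussian process noise and the quantities from the previous snippet:

    import numpy as np

    def mc_cost(A, B, Q, R, K, W, u_max, T=100000, seed=0):
        """Estimate the average stage cost J of the saturated linear
        policy u = sat(Kx) by simulating one long trajectory."""
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        x = np.zeros(n)
        total = 0.0
        for _ in range(T):
            u = np.clip(K @ x, -u_max, u_max)     # projected linear feedback (5)
            total += x @ Q @ x + u @ R @ u        # stage cost
            x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
        return total / T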
Journal:
  • Systems & Control Letters

Volume 58, Pages 178–182

Publication date: 2009

doi:10.1016/j.sysconle.2008.10.004